#### DESCRIPTION

### MOTION COMPENSATION METHOD

### **Technical Field**


The present invention relates to a motion compensation method for interpolating sub-pixels into a reference picture and for performing motion compensation based on the interpolated reference picture.

### **Background Art**

Moving pictures are being adopted in an increasing number of applications ranging from video telephony and video conferencing to DVD and digital television. When moving pictures are transmitted, a substantial amount of data has to be sent through conventional transmission channels of limited available frequency bandwidth. In order to transmit the digital data through the limited channel bandwidth, it is necessary to compress or reduce the volume of the transmission data.

In order to enable inter-operability between systems designed by different manufacturers for any given application, video coding standards have been developed for compressing the amount of video data. The coding approach underlying most of these standards consists of the following main steps:

- (1) Dividing each video frame into blocks of pixels so that processing of the video frame can be conducted at a block level;
- (2) Reducing spatial redundancies within a video frame by subjecting video data of each block to transform, quantization and entropy coding;
- (3) Exploiting temporal dependencies between blocks of subsequent frames in order to only transmit differentials between subsequent frames.

Temporal dependencies between blocks of subsequent frames are determined by employing a motion estimation and compensation technique. For any given block, a search is performed in previously coded and transmitted frames to determine a motion vector which will be used by the coding apparatus and the decoding apparatus to predict the image data of a block.


An example configuration of a video coding apparatus is illustrated in FIG. 1. The video coding apparatus shown, generally denoted by reference numeral 900, includes: a transform/quantization unit 920 for outputting quantized transform coefficients QC by transforming spatial image data to the frequency domain and quantizing the transformed image data; an entropy coding unit 990 for performing entropy coding (variable length coding) of the quantized transform coefficients QC and outputting the bit stream BS; and a video buffer (not shown) for adapting the compressed video data having a variable bit rate to a transmission channel which may have a fixed bit rate.

The coding apparatus shown in FIG. 1 employs DPCM (Differential Pulse Code Modulation) by only transmitting differentials between subsequent fields or frames. A subtractor 910 obtains these differentials by receiving the video data to be coded as an input signal IS and subtracting the previous image indicated by a prediction signal PS therefrom. The previous image is obtained by decoding the previously coded image. This is accomplished by a decoding apparatus which is incorporated into video coding apparatus 900. The decoding apparatus performs the coding steps in a reverse manner. More specifically, the decoding apparatus includes: an inverse quantization/transform unit 930; and an adder 935 for adding the decoded differential (differential decoding signal DDS) to the previously decoded picture (prediction signal PS) in order to produce the image as it will be obtained on the decoding side.

In motion compensated DPCM, a current frame or field is predicted from image data of a previous frame or field based on an estimation of the motion between the current and the previous images. Such estimated motion may be described in terms of 2-dimensional motion vectors representing the displacement of pixels between the previous and the current images. Usually, motion estimation is performed on a block-by-block basis. An example of the division of the current image into a plurality of blocks is illustrated in FIG. 2.


During motion estimation, a block of a current frame is compared with blocks in previous frames until a best match is determined. Based on the comparison results, an inter-frame displacement vector for the whole block can be estimated for the current frame. For this purpose, a motion estimation unit 970 is incorporated into the coding apparatus together with the corresponding motion compensation unit 960 included in the decoding path.
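The block comparison described above can be sketched as an exhaustive full-pel search. This is only an illustration, not the search used by motion estimation unit 970: the matching criterion (sum of absolute differences), the function names `sad` and `full_search`, and the `search_range` parameter are all assumptions introduced here.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Return (dx, dy, cost) of the best full-pel match in the search window.

    The SAD criterion and the exhaustive scan are assumptions of this sketch.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would read outside the reference frame.
            if bx + dx < 0 or bx + dx + size > width:
                continue
            if by + dy < 0 or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The returned displacement plays the role of the motion vector for the block. A sub-pixel search would run the same comparison against the interpolated reference picture.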

The video coding apparatus 900 of FIG. 1 performs operations as follows. A given video image indicated by an input signal IS is divided into a number of small blocks, usually denoted as "macro blocks". For example, the video image shown in FIG. 2 is divided into a plurality of macro blocks, each of which usually has a size of 16 × 16 pixels.

When coding the video data of an image by only reducing spatial redundancies within the image, the resulting frame is referred to as an I-picture. I-pictures are typically coded by directly applying the transform to the macro blocks of a frame. I-pictures are large in size as no temporal information is exploited to reduce the amount of data.

In order to take advantage of temporal redundancies that exist between successive images, predictive coding between subsequent fields or frames is performed based on motion estimation and compensation. When the reference frame selected in motion estimation is a previously coded frame, the frame to be coded is referred to as a P-picture. When both a previously coded frame and a future frame are chosen as reference frames, the frame to be coded is referred to as a B-picture.


Although the motion compensation has been described to be based on a  $16 \times 16$  macro block, motion estimation and compensation can be performed using a number of different block sizes. Individual motion vectors may be determined for blocks having  $4 \times 4$ ,  $4 \times 8$ ,  $8 \times 4$ ,  $8 \times 8$ ,  $8 \times 16$ ,  $16 \times 8$ , or  $16 \times 16$  pixels. The provision of small motion compensation blocks improves the ability to handle fine motion details.

Based on the results of the motion estimation operation, the motion compensation operation provides a prediction based on the determined motion vector. The information contained in a prediction error block resulting from the predicted block is then transformed into transform coefficients in transform/quantization unit 920. Generally, a 2-dimensional DCT (Discrete Cosine Transform) is employed. The resulting transform coefficients are quantized and finally entropy coded (VLC) in entropy coding unit 990.

A decoding apparatus receives the transmitted bit stream BS of compressed video data and reproduces a sequence of coded video images based on the received data. The configuration of the decoding apparatus corresponds to that of the decoding apparatus included in the coding apparatus shown in FIG. 1. A detailed description of the configuration of the decoding apparatus is therefore omitted.

In order to improve the accuracy of motion compensation, a sub-pixel accuracy of reference frames is widely used. For example, 1/2 sub-pixel accuracy motion compensation is used in the MPEG-2 format.


In order to further increase the motion vector accuracy and coding efficiency, 1/3 and 1/6 sub-pixel vector accuracies have been proposed in Patent Literature EP 1 073 276.

The motion vector accuracy and coding efficiency can further be increased by applying interpolation filters in motion estimation and compensation yielding 1/8 sub-pixel displacements. However, such a sub-pixel resolution requires high computation complexity, in particular, calculation registers having a length of up to 25 bits.


Such a complex implementation may be based on a 2-step approach. In the first step, a 1/4 sub-pixel image is calculated by employing an 8-tap filter. In the second step, a 1/8 sub-pixel image is obtained from the 1/4 sub-pixel image by employing bilinear filtering.

The filtering operation for generating the image with the 1/4 sub-pixel accuracy includes the steps of horizontal and subsequent vertical filtering. The horizontal interpolation may be performed based on the following Equations (1) to (3):

$$h_1 = -3 \cdot A_4 + 12 \cdot B_4 - 37 \cdot C_4 + 229 \cdot D_4 + 71 \cdot E_4 - 21 \cdot F_4 + 6 \cdot G_4 - 1 \cdot H_4 \dots (1)$$

$$h_2 = -3 \cdot A_4 + 12 \cdot B_4 - 39 \cdot C_4 + 158 \cdot D_4 + 158 \cdot E_4 - 39 \cdot F_4 + 12 \cdot G_4 - 3 \cdot H_4 \dots (2)$$

$$h_3 = -1 \cdot A_4 + 6 \cdot B_4 - 21 \cdot C_4 + 71 \cdot D_4 + 229 \cdot E_4 - 37 \cdot F_4 + 12 \cdot G_4 - 3 \cdot H_4 \dots (3)$$

In the above equations,  $h_1$  to  $h_3$  denote the 1/4 sub-pixel values and  $A_x$ - $H_x$  represent the original full-pel pixel values, namely, the pixels from the original image.

Coefficients applied to the above  $A_x-H_x$  are set in a way that the signal processing is performed preventing the occurrence of imaging by upsampling, in other words, unnecessary high frequency components generated through interpolation are eliminated.

The horizontal filtering is illustrated in FIG. 3. Eight-tap filtering is performed based on the pixel values of the original pixels 210 and the pixel values of the three intermediate pixels 220 are calculated in order to obtain a 1/4 sub-pixel accuracy in the horizontal direction.


After the horizontal filtering has been completed, the resulting image data having a full-pel pixel accuracy in the vertical direction and a 1/4 sub-pixel accuracy in the horizontal direction are subjected to vertical filtering. For this purpose, the following Equations (4) to (6) having coefficients which correspond to those of the above described horizontal filter are employed.

$$v_1 = -3 \cdot D_1 + 12 \cdot D_2 - 37 \cdot D_3 + 229 \cdot D_4 + 71 \cdot D_5 - 21 \cdot D_6 + 6 \cdot D_7 - 1 \cdot D_8 \dots (4)$$

$$v_2 = -3 \cdot D_1 + 12 \cdot D_2 - 39 \cdot D_3 + 158 \cdot D_4 + 158 \cdot D_5 - 39 \cdot D_6 + 12 \cdot D_7 - 3 \cdot D_8 \dots (5)$$

$$v_3 = -1 \cdot D_1 + 6 \cdot D_2 - 21 \cdot D_3 + 71 \cdot D_4 + 229 \cdot D_5 - 37 \cdot D_6 + 12 \cdot D_7 - 3 \cdot D_8 \dots (6)$$

In the above equations,  $v_1$  to  $v_3$  refer to the calculated vertical 1/4 sub-pixel values and  $D_1$ ,  $D_2$ ,  $D_3$ ,  $D_4$ ,  $D_5$ ,  $D_6$ ,  $D_7$  and  $D_8$  represent the full-pel resolution pixels, namely, the pixel values of the original pixels 210.

Like in the case described above, coefficients applied to  $D_{\rm x}$  are set in a way that the signal processing is performed preventing the occurrence of imaging by upsampling, in other words, unnecessary high frequency components generated through interpolation are eliminated.

The resulting pixel values have a length of up to 25 bits. In order to obtain image data in which each of the pixel values falls within a predefined range of allowable pixel values, the calculation results are downshifted and rounded. An example for pixel value  $v_1$  is shown by the following Equation (7):

$$v_1' = \left(v_1 + \frac{256^2}{2}\right) >> 16...(7)$$


Here,  $v_1$  represents the pixel value resulting from the horizontal and vertical filtering, while  $v_1$ ' represents the downshifted pixel value. The downshifted pixel values are further clipped to the range of 0 to 255.
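The 25-bit figure can be checked with a small worst-case bound. The following is a sketch under stated assumptions rather than part of the described apparatus: it treats every input sample as independently free to take 0 or 255, applies the coefficients of Equations (1) to (6) in two passes with no rounding in between, and counts magnitude bits only. The helper names are hypothetical.

```python
# Coefficients of Equations (1)-(3); Equations (4)-(6) reuse them vertically.
CONVENTIONAL_TAPS = [
    (-3, 12, -37, 229, 71, -21, 6, -1),
    (-3, 12, -39, 158, 158, -39, 12, -3),
    (-1, 6, -21, 71, 229, -37, 12, -3),
]


def output_range(taps, lo, hi):
    """Smallest and largest filter output when every input lies in [lo, hi]."""
    largest = sum(c * (hi if c > 0 else lo) for c in taps)
    smallest = sum(c * (lo if c > 0 else hi) for c in taps)
    return smallest, largest


def cascade_range(tap_sets, lo=0, hi=255):
    """Range after a horizontal and then a vertical pass, without rounding."""
    h_ranges = [output_range(t, lo, hi) for t in tap_sets]
    h_lo = min(r[0] for r in h_ranges)
    h_hi = max(r[1] for r in h_ranges)
    v_ranges = [output_range(t, h_lo, h_hi) for t in tap_sets]
    return min(r[0] for r in v_ranges), max(r[1] for r in v_ranges)


def round_shift(value, bits):
    """Rounded downshift as in Equation (7): add half the divisor, then shift."""
    return (value + (1 << (bits - 1))) >> bits


v_min, v_max = cascade_range(CONVENTIONAL_TAPS)
print(v_max, v_max.bit_length())  # magnitude bits of the worst-case value
```

Under these assumptions the largest unrounded value needs 25 magnitude bits, which agrees with the register length stated above.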


The vertical filtering is illustrated in FIG. 4. The pixel values of the pixels 230 obtained during vertical filtering complete the sub-pixel array, which is illustrated by way of example between original pixels  $D_4$ ,  $D_5$ ,  $E_4$  and  $E_5$ .

After having the 1/4 sub-pixel image completed, a 1/8 sub-pixel frame is calculated by applying a bilinear filtering to the 1/4 sub-pixel resolution. In this manner, intermediate pixels are generated between each of the 1/4 resolution pixels.

A bilinear filtering is applied in two steps and is illustrated by way of examples in FIG. 5 and FIG. 6. Starting from the 1/4 sub-pixel resolution, FIG. 5 illustrates the application of a horizontal and vertical filtering. For this purpose, a mean value is calculated from the respective neighbouring pixel values in order to obtain an intermediate pixel value of a 1/8 sub-pixel resolution. When employing a binary representation for this processing, the following Equation (8) can be applied. Note that ">>1" in Equation (8) represents 1-bit downshifting.

$$A = (B + C + 1) >> 1...(8)$$

The remaining 1/8 sub-pixel values to be interpolated are calculated by diagonal filtering as illustrated in FIG. 6. It is a particular advantage of this approach that, in the bilinear filtering, the number of sub-pixel values stemming from multiple filtering is minimized as much as possible. For this purpose, it is preferable that only those pixel values, of the interpolated pixels, that are directly derived from original pixel values 210 are taken into account. In other words, those derived pixel values are the pixel values of the interpolated pixels located between the original pixels.

All intermediate pixel values can be calculated therefrom, in other words, from the pixel values of the original pixels 210 and the intermediate pixel values derived from the original pixel values, when additionally taking center pixel 240 of the sub-pixel array into account. The calculation of each additional 1/8 sub-pixel value is based on two of the 1/4 sub-pixel resolution values. The individual pixel values taken into account for the calculation of an intermediate pixel value are illustrated in FIG. 6 by respective arrows. Each arrow connects the two pixels whose values are used to calculate the intermediate pixel value lying between them. Depending on the distance of the pixels to be taken into account for interpolation, the following Equations (9) and (10) are employed:

$$D = (E + F + 1) >> 1...(9)$$

$$G = (3H + I + 2) >> 2...(10)$$


In the above equations, D and G represent new intermediate pixel values as illustrated in FIG. 6, and E, F, H and I represent the pixel values obtained from the 1/4 resolution image. The additional values of "1" and "2" in the above equations only serve for correctly rounding the calculation results.
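The two bilinear rules can be written as integer operations. This is a minimal sketch with hypothetical function names. For the weighted case it assumes a 2-bit downshift, since the weights 3 and 1 sum to 4 and the rounding offset is 2.

```python
def average_equal(p, q):
    """Equations (8) and (9): rounded mean of two equally distant neighbours."""
    return (p + q + 1) >> 1


def average_weighted(near, far):
    """Equation (10): the nearer neighbour is weighted 3, the farther one 1.

    Assumption of this sketch: the sum is divided by 4 (a 2-bit downshift),
    matching the weight total and the rounding offset of 2.
    """
    return (3 * near + far + 2) >> 2
```

For example, `average_equal(10, 13)` gives 12 and `average_weighted(100, 104)` gives 101, so both results stay between their two inputs.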

However, the above-described conventional motion compensation method requires storing intermediate operation values of up to 25 bits in the filtering process for 1/4 sub-pixel interpolation. A particular disadvantage of such an interpolation approach is therefore that long registers are needed, resulting in high hardware complexity and computational effort.

The present invention is conceived in view of this drawback. An object of the present invention is to provide a motion compensation method for reducing operational workload and simplifying a hardware configuration.

### **Disclosure of Invention**

In order to achieve the above-described object, the motion compensation method of the present invention includes: interpolating sub-pixels in a reference picture; and performing motion compensation based on the interpolated reference picture. In the method, the interpolating includes: a first calculation step of calculating base values which are bases of sub-pixel values of the sub-pixels by multiplying coefficients with pixel values of pixels included in the reference picture; and a first rounding step of deriving the sub-pixel values of the sub-pixels by rounding the base values calculated in the first calculation step instead of directly using the base values in calculating sub-pixel values of other sub-pixels. The performing of motion compensation includes performing motion compensation based on the reference picture having the interpolated sub-pixels with the correspondingly derived sub-pixel values.


For example, in the conventional method, base values of sub-pixels that have been calculated are directly used in calculating sub-pixel values of other sub-pixels. However, in the present invention, the base values of sub-pixels that have been calculated in the first calculation step are rounded instead of being directly used in calculating the sub-pixel values of other sub-pixels. Therefore, even in the case where the sub-pixel values of the other sub-pixels are calculated using the rounded base values, the number of bits to be used in the calculation can be reduced compared with the conventional method. As a result, it becomes possible to reduce the operational workload and to simplify the hardware configuration.

Also, in a first aspect of the present invention, in the motion compensation method, the first calculation step may include calculating base values of sub-pixels to be interpolated in a first direction, and the first rounding step may include deriving sub-pixel values of the sub-pixels to be interpolated in the first direction by rounding the base values calculated in the first calculation step. At this time, in a second aspect of the present invention, in the motion compensation method, the interpolation may further include: a second calculation step of calculating, using the sub-pixel values of the sub-pixels derived in the first rounding step, base values of sub-pixels to be interpolated in a second direction that is different from the first direction; and a second rounding step of deriving the sub-pixel values of the sub-pixels to be interpolated in the second direction by rounding the base values calculated in the second calculation step.


In this way, in the process of calculating sub-pixel values of sub-pixels to be interpolated in the first direction and in the second direction, the number of bits to be used in the calculation can be reduced down to 16 bits from, for example, 25 bits needed in a conventional way.

Also, in a fourth aspect of the present invention, in the motion compensation method, the first calculation step may include calculating the base values of three 1/4 sub-pixels using the following equations when eight pixel values of pixels arrayed in the first direction are represented as A, B, C, D, E, F, G and H respectively and the three 1/4 sub-pixel values are represented as  $h_1$ ,  $h_2$  and  $h_3$  respectively:

$$\begin{aligned} h_1 &= -1 \cdot A + 3 \cdot B - 10 \cdot C + 59 \cdot D + 18 \cdot E - 6 \cdot F + 1 \cdot G - 0 \cdot H; \\ h_2 &= -1 \cdot A + 4 \cdot B - 10 \cdot C + 39 \cdot D + 39 \cdot E - 10 \cdot F + 4 \cdot G - 1 \cdot H; \\ &\text{and} \\ h_3 &= -0 \cdot A + 1 \cdot B - 6 \cdot C + 18 \cdot D + 59 \cdot E - 10 \cdot F + 3 \cdot G - 1 \cdot H. \end{aligned}$$

Here, in a fifth aspect of the present invention, in the motion compensation method, the second calculation step may include calculating the base values of three 1/4 sub-pixels using the following equations when eight pixel values of pixels arrayed in the second direction are represented as  $D_1$ ,  $D_2$ ,  $D_3$ ,  $D_4$ ,  $D_5$ ,  $D_6$ ,  $D_7$  and  $D_8$  respectively and the three 1/4 sub-pixel values are represented as  $v_1$ ,  $v_2$  and  $v_3$  respectively:

$$\begin{aligned} v_1 &= -1 \cdot D_1 + 3 \cdot D_2 - 10 \cdot D_3 + 59 \cdot D_4 + 18 \cdot D_5 - 6 \cdot D_6 + 1 \cdot D_7 - 0 \cdot D_8 \,; \\ v_2 &= -1 \cdot D_1 + 4 \cdot D_2 - 10 \cdot D_3 + 39 \cdot D_4 + 39 \cdot D_5 - 10 \cdot D_6 + 4 \cdot D_7 - 1 \cdot D_8 \,; \\ &\text{and} \\ v_3 &= -0 \cdot D_1 + 1 \cdot D_2 - 6 \cdot D_3 + 18 \cdot D_4 + 59 \cdot D_5 - 10 \cdot D_6 + 3 \cdot D_7 - 1 \cdot D_8 \,. \end{aligned}$$

In this way, the coefficients used in calculating sub-pixel values of sub-pixels are smaller than the conventional coefficients. This makes it possible to further reduce the number of bits to be used in calculating the sub-pixel values.

Also, in the fourth aspect of the present invention, the motion compensation method may further include a bilinear filtering step of raising the sub-pixel accuracy by applying bilinear filtering to the reference picture having the interpolated sub-pixels with the correspondingly derived sub-pixel values.

In this way, the increase in sub-pixel accuracy makes it possible to prevent picture quality from deteriorating during the picture coding processing and the picture decoding processing.

Note that the present invention can be realized as a motion compensation method, a motion estimation method, a moving picture coding method and a moving picture decoding method using the motion compensation method, a program causing a computer to execute these steps of the respective methods, a recording medium for storing the program, and an apparatus for performing operations according to these methods.

# Further Information about Technical Background to this Application

The disclosure of EP Application No. 04016437.8 filed on July 13th, 2004 including specification, drawings and claims is incorporated herein by reference in its entirety.

## **Brief Description of Drawings**


These and other objects, advantages and features of the invention will become apparent from the following description thereof taken in conjunction with the accompanying drawings that illustrate a specific embodiment of the invention. In the Drawings:

- FIG. 1 is a block diagram showing the structure of a moving picture coding apparatus;
- FIG. 2 is an illustration of how a video image is divided into blocks;
- FIG. 3 is an illustration of horizontal filtering for calculating a 1/4 sub-pixel accuracy in the horizontal direction;
- FIG. 4 is an illustration of vertical filtering for calculating a 1/4 sub-pixel accuracy in the vertical direction;
- FIG. 5 is an illustration of horizontal and vertical filtering for calculating a 1/8 sub-pixel accuracy;
- FIG. 6 is an illustration of bilinear filtering in the diagonal direction for calculating a 1/8 sub-pixel accuracy;
- FIG. 7 is a block diagram showing the configuration of a moving picture coding apparatus in the embodiment of the present invention;
- FIG. 8 is a flow chart showing the motion compensation operation performed by the moving picture coding apparatus in the embodiment;
- FIG. 9 is a comparison graph illustrating the difference between a coding result of a first image in the present invention and a coding result of another image obtained using a conventional method;
- FIG. 10 is a comparison graph illustrating the difference between a coding result of a second image in the present invention and a coding result of another image obtained using a conventional method;
- FIG. 11 is a block diagram showing the structure of a moving picture decoding apparatus in the embodiment of the present invention; and
- FIG. 12 is an illustration of an interpolation method concerning the variation of the embodiment.

## **Best Mode for Carrying Out the Invention**


A moving picture coding apparatus and a moving picture decoding apparatus in the embodiment of the present invention will be described below with reference to figures.

In video coding, the coding efficiency is increased by applying motion estimation and motion compensation in predictive coding. The motion estimation and compensation can be improved by reducing the differential remaining between the image data to be coded and the predictive image data. In particular, a 1/8 sub-pixel motion vector accuracy can further improve the coding efficiency.

The present invention achieves improved motion estimation and compensation without a corresponding increase in hardware complexity and computational effort. This is because the present invention makes it possible to use only 16-bit accuracy for intermediate calculation results.

FIG. 7 is a block diagram showing the configuration of the moving picture coding apparatus in this embodiment.

This moving picture coding apparatus 100 includes: a subtractor 110; a transform/quantization unit 120; an inverse quantization/inverse transform unit 130; an adder 135; a deblocking filter 137; a memory 140; a 16-bit operation interpolation filter 150; a motion compensation/prediction unit 160; a motion estimation unit 170; and an entropy coding unit 190.

The subtractor 110 subtracts a prediction signal PS from an input signal IS indicating a moving picture and outputs the differential to the transform/quantization unit 120.

The transform/quantization unit 120 obtains the differential from the subtractor 110 and performs coding processing of frequency transform (such as DCT transform) and quantization using the differential. After that, the transform/quantization unit 120 outputs the quantized transform coefficient QC that is the processing result to the entropy coding unit 190 and the inverse quantization/inverse transform unit 130.

The inverse quantization/inverse transform unit 130 performs decoding processing of inverse quantization and inverse DCT transform using the quantized transform coefficient QC outputted from the transform/quantization unit 120. After that the inverse quantization/inverse transform unit 130 outputs the differential decoding signal DDS that is the processing result to the adder 135.


The adder 135 adds the differential decoding signal DDS to the prediction signal PS obtained from the motion compensation/prediction unit 160, and outputs the picture obtained as the result to the deblocking filter 137.

The deblocking filter 137 removes the block distortion of the picture outputted from the adder 135, and stores the picture with no block distortion in the memory 140 as a reference picture.

The 16-bit operation interpolation filter 150 extracts a reference picture from the memory 140 and performs 1/8 sub-pixel interpolation of the reference picture.

The motion estimation unit 170 estimates a motion vector based on the picture indicated by the input signal IS and the reference picture on which 1/8 sub-pixel interpolation has been performed using the 16-bit operation interpolation filter 150. After that, the motion estimation unit 170 outputs the motion data MD indicating the detected motion vector to the motion compensation/prediction unit 160 and the entropy coding unit 190.

The motion compensation/prediction unit 160 performs motion compensation based on the motion vector indicated by the motion data MD and the reference picture on which 1/8 sub-pixel interpolation has been performed using the 16-bit operation interpolation filter 150. In this way, the motion compensation/prediction unit 160 predicts the current picture indicated by the input signal IS and outputs the prediction signal PS indicating the prediction picture to the subtractor 110.

The entropy coding unit 190 performs entropy coding of the quantized transform coefficients QC outputted by the transform/quantization unit 120 and the motion data MD outputted by the motion estimation unit 170, and outputs the result as a bit stream BS.

The moving picture coding apparatus 100 in this embodiment is characterized by including the 16-bit operation interpolation filter 150. In other words, the motion compensation method in this embodiment is characterized in that motion compensation is performed using the 1/8 sub-pixel interpolation by this 16-bit operation interpolation filter 150.

Note that, in the moving picture coding apparatus 100 in this embodiment, the respective functional units other than the 16-bit operation interpolation filter 150 have the same functions as the respective functional units included in the above-described conventional moving picture coding apparatus.

The 16-bit operation interpolation filter 150 calculates the 1/4 sub-pixel values using a method that differs from the conventional method, and then calculates the 1/8 sub-pixel values from the 1/4 sub-pixel values in the same way as the conventional method. The method by which this 16-bit operation interpolation filter 150 calculates the 1/4 sub-pixel values will be described in detail.

A two-stage procedure is employed for obtaining the 1/8 pixel accuracy. In a first stage including two interpolation steps, horizontal filtering and vertical filtering are applied in succession. For interpolating 1/4 sub-pixel values in the horizontal direction, the following Equations (11) to (13) are applied:


$$h_{1} = -1 \cdot A_{h} + 3 \cdot B_{h} - 10 \cdot C_{h} + 59 \cdot D_{h} + 18 \cdot E_{h} - 6 \cdot F_{h} + 1 \cdot G_{h} - 0 \cdot H_{h} \dots (11)$$

$$h_{2} = -1 \cdot A_{h} + 4 \cdot B_{h} - 10 \cdot C_{h} + 39 \cdot D_{h} + 39 \cdot E_{h} - 10 \cdot F_{h} + 4 \cdot G_{h} - 1 \cdot H_{h} \dots (12)$$

$$h_{3} = -0 \cdot A_{h} + 1 \cdot B_{h} - 6 \cdot C_{h} + 18 \cdot D_{h} + 59 \cdot E_{h} - 10 \cdot F_{h} + 3 \cdot G_{h} - 1 \cdot H_{h} \dots (13)$$

In the above equations,  $h_1$  to  $h_3$  represent the 1/4 sub-pixel values to be interpolated, and  $A_x$ - $H_x$  represent the original full-pel pixel values.

Here, the respective coefficients of  $A_x$ - $H_x$  in this embodiment are set so that unnecessary high frequency components generated through interpolation are eliminated like in the conventional method. More specifically, the coefficients are set smaller than the conventional coefficients under the condition that picture quality does not deteriorate in the coding and decoding processing. In other words, the respective coefficients in this embodiment are set smaller in proportion to the respective coefficients of the conventional Equations (1) to (3).

After completing the horizontal filtering, the calculated values are rounded by being downshifted. For example, the intermediate value of  $h_1$  is rounded using the following Equation (14).

$$h_1' = \left(h_1 + \frac{64}{2}\right) >> 6...(14)$$


Here,  $h_1$  represents the interpolated pixel value resulting from horizontal filtering, and  $h_1$ ' represents the respectively downshifted pixel value. A corresponding processing is applied to all of the interpolated pixel values resulting from horizontal filtering. Note that ">>6" in the Equation (14) represents 6-bit downshifting.
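The horizontal step with its rounding can be sketched as follows. The coefficients are those of Equations (11) to (13) and the rounding is that of Equation (14). The names `QUARTER_PEL_TAPS`, `SHIFT` and `filter_and_round` are hypothetical and chosen for this illustration only.

```python
# Coefficients of Equations (11)-(13), applied to samples A..H in order.
QUARTER_PEL_TAPS = (
    (-1, 3, -10, 59, 18, -6, 1, 0),    # h1
    (-1, 4, -10, 39, 39, -10, 4, -1),  # h2
    (0, 1, -6, 18, 59, -10, 3, -1),    # h3
)
SHIFT = 6  # each coefficient set sums to 64 = 2**6


def filter_and_round(window):
    """Return (h1', h2', h3') for eight consecutive samples A..H.

    Each base value is rounded by adding 64/2 and downshifting by 6 bits,
    as in Equation (14). No clipping is applied at this point.
    """
    if len(window) != 8:
        raise ValueError("expected eight samples")
    results = []
    for taps in QUARTER_PEL_TAPS:
        base = sum(c * p for c, p in zip(taps, window))
        results.append((base + (1 << (SHIFT - 1))) >> SHIFT)
    return tuple(results)
```

A flat window is reproduced unchanged, for instance `filter_and_round([100] * 8)` returns `(100, 100, 100)`, because the coefficients of each filter sum to 64.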

In the second step of the first stage, the sub-pixel accuracy already obtained in the horizontal direction is also obtained in the vertical direction. For this purpose, vertical filtering is applied. The previously performed downshift operation ensures that none of the intermediate calculations exceeds 16-bit accuracy in the vertical filtering step. The vertical filtering is performed by employing the filter coefficients shown in the following Equations (15) to (17), which correspond to Equations (11) to (13) in the case of the horizontal filtering:

$$v_{1} = -1 \cdot D_{v-3} + 3 \cdot D_{v-2} - 10 \cdot D_{v-1} + 59 \cdot D_{v} + 18 \cdot D_{v+1} - 6 \cdot D_{v+2} + 1 \cdot D_{v+3} - 0 \cdot D_{v+4} \dots (15)$$

$$v_{2} = -1 \cdot D_{v-3} + 4 \cdot D_{v-2} - 10 \cdot D_{v-1} + 39 \cdot D_{v} + 39 \cdot D_{v+1} - 10 \cdot D_{v+2} + 4 \cdot D_{v+3} - 1 \cdot D_{v+4} \dots (16)$$

$$v_{3} = -0 \cdot D_{v-3} + 1 \cdot D_{v-2} - 6 \cdot D_{v-1} + 18 \cdot D_{v} + 59 \cdot D_{v+1} - 10 \cdot D_{v+2} + 3 \cdot D_{v+3} - 1 \cdot D_{v+4} \dots (17)$$

Here,  $v_1$  to  $v_3$  refer to the 1/4 sub-pixel values in the vertical direction and  $D_{v-3}$ ,  $D_{v-2}$ ,  $D_{v-1}$ ,  $D_v$ ,  $D_{v+1}$ ,  $D_{v+2}$ ,  $D_{v+3}$  and  $D_{v+4}$  represent the full-pel pixels in the vertical direction. In other words, the full-pel pixels are pixels 210 and 220 from FIG. 3.

Here, the respective coefficients of  $D_x$  ( $D_{v-3}$  to  $D_{v+4}$ ) in this embodiment are set smaller in proportion to the respective coefficients of the conventional Equations (4) to (6) like in the case of the respective coefficients of the above  $A_x-H_x$ .

The calculation results from the vertical filtering, namely, pixel values 230, are subjected to downshifting by applying the following Equation (18) which is illustrated as an example case of  $v_1$  only:

$$v_1' = \left(v_1 + \frac{64}{2}\right) >> 6 \dots (18)$$

A rounding during the downshift operation is achieved by adding the value  $2^6/2=64/2$  to the interpolated pixel value.
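The rounding downshift of Equation (18) may be sketched as follows. The function name `round_downshift` and its default argument are assumptions for illustration. Note that Python's `>>` on negative integers is an arithmetic (floor) shift, which is assumed here to match the intended behaviour for negative intermediate values.

```python
def round_downshift(value, shift=6):
    """Divide by 2**shift with rounding, in the manner of Equation (18)."""
    offset = (1 << shift) >> 1  # 2**6 / 2 = 32 for the 6-bit case
    return (value + offset) >> shift


print(round_downshift(6400))  # a constant region of value 100 is preserved
print(round_downshift(95))    # 95/64 is below 1.5, so it rounds down
print(round_downshift(96))    # 96/64 is exactly 1.5, so it rounds up
```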

Although the above description applies horizontal filtering first and vertical filtering second, together with the respective downshift operations, a person skilled in the art is aware that the horizontal and vertical operations may be exchanged to achieve the same result. Thus, the vertical filtering may be performed before the horizontal filtering.

The finally obtained sub-pixel values with a 1/4 sub-pixel accuracy are clipped in order to fall within a range between 0 and 255.

The obtained 1/4 sub-pixel values are subjected to bilinear filtering, as described above in connection with FIG. 5 and FIG. 6, in order to obtain a 1/8 sub-pixel resolution.

The following example demonstrates that the processing of the present invention does not require any registers for intermediate pixel values exceeding a 16-bit accuracy.

Assuming that a pixel value falls in the range between 0 and 255, the largest possible values during a horizontal 8-tap filtering may occur when employing the following Equation (19) for calculating intermediate pixel value  $h_2$ :

$$h_2 = -1 \cdot 0 + 4 \cdot 255 + \left(-10\right) \cdot 0 + 39 \cdot 255 + 39 \cdot 255 + \left(-10\right) \cdot 0 + 4 \cdot 255 - 1 \cdot 0 \dots (19)$$

$$h_2 = 21930 < 32768 = 2^{15} \Rightarrow 15\ \text{bit} + 1\ \text{bit (sign)} \dots (20)$$

In this way, this embodiment eliminates the need for calculations exceeding 16 bits when calculating 1/4 sub-pixel values.

The resulting pixel value is downshifted as indicated by the following Equation (21):

$$\left(21930 + \frac{64}{2}\right) >> 6 = 343 \dots (21)$$

The result of the downshift operation is clipped to the range of 0 to 255.
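The worst-case arithmetic of Equations (19) to (21) can be checked mechanically, for example as sketched below. The sketch uses the coefficient sets listed in Equations (15) to (17), which the description states correspond to those of the horizontal filtering, and assumes input pixel values between 0 and 255. All function names are hypothetical.

```python
# Coefficient sets as listed in Equations (15) to (17).
COEFFS = {
    "(15)": (-1, 3, -10, 59, 18, -6, 1, 0),
    "(16)": (-1, 4, -10, 39, 39, -10, 4, -1),
    "(17)": (0, 1, -6, 18, 59, -10, 3, -1),
}


def extremes(coeffs, lo=0, hi=255):
    """Smallest and largest filter sums for inputs in [lo, hi]."""
    largest = sum(c * (hi if c > 0 else lo) for c in coeffs)
    smallest = sum(c * (lo if c > 0 else hi) for c in coeffs)
    return smallest, largest


def fits_int16(value):
    """True if the value is representable as a signed 16-bit integer."""
    return -32768 <= value <= 32767


def downshift_and_clip(value, shift=6, lo=0, hi=255):
    """Rounding downshift followed by clipping to [lo, hi]."""
    shifted = (value + ((1 << shift) >> 1)) >> shift
    return max(lo, min(hi, shifted))


for name, coeffs in COEFFS.items():
    smallest, largest = extremes(coeffs)
    print(name, smallest, largest, fits_int16(smallest) and fits_int16(largest))

print(downshift_and_clip(21930))  # 343 before clipping, 255 after
```

The largest sum arises for the symmetric coefficient set, in agreement with Equation (19).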

As demonstrated above, the accuracy required for the largest possible values during the filtering operation does not exceed 16 bits. Although the above example has been calculated only for the horizontal direction, corresponding coefficients are used for the vertical filtering and, thus, the same advantages apply to the vertical filtering.

The above example relates only to the 1/4 sub-pixel resolution calculation. The bilinear filtering for generating a 1/8 sub-pixel resolution requires a maximum accuracy of only 10 bits. Thus, a maximum accuracy of 16 bits is sufficient for performing all calculations of the present invention. Accordingly, the motion estimation, the motion compensation, and the coding and decoding of moving picture data can be improved in a simple manner.
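The 10-bit figure can be made plausible with a short check. The sketch below assumes, purely for illustration, that the bilinear filtering averages either two or four clipped 1/4 sub-pixel values (each between 0 and 255) and adds a rounding offset before the downshift; the exact bilinear rule is the one described in connection with FIG. 5 and FIG. 6 and is not reproduced here.

```python
PEAK = 255  # largest clipped 1/4 sub-pixel value

# Assumed worst-case intermediate sums before the final downshift.
two_point_sum = 2 * PEAK + 1   # average of two neighbours, rounding offset 1
four_point_sum = 4 * PEAK + 2  # average of four neighbours, rounding offset 2

print(two_point_sum, two_point_sum.bit_length())
print(four_point_sum, four_point_sum.bit_length())
```

Under these assumptions the intermediate sums stay below $2^{10} = 1024$, so the bilinear stage is well within the 16-bit limit.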

FIG. 8 is a flow chart showing the motion compensation operation performed by the moving picture coding apparatus 100 in the embodiment.

First, the 16-bit operation interpolation filter 150 of the moving picture coding apparatus 100 calculates 1/4 sub-pixel values (base values which are bases of sub-pixel values) of the reference picture extracted from the memory 140 in the horizontal direction (Step 100). After that, the 16-bit operation interpolation filter 150 performs downshifting of the pixel values obtained in Step 100, and rounds the pixel values (Step 102).

Next, the 16-bit operation interpolation filter 150 calculates 1/4 sub-pixel values in the vertical direction using the pixel values rounded in Step 102 (Step 104). After that, the 16-bit operation interpolation filter 150 performs downshifting of the pixel values obtained in Step 104 and rounds the pixel values (Step 106).

Through Step 100 to Step 106, 1/4 sub-pixels of the reference picture are interpolated in the horizontal direction and the vertical direction.
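Steps 100 to 106 may be sketched for a single 1/4 sub-pixel as follows. The sketch assumes an 8 x 8 neighbourhood of full-pel values around the target position and 1/4-pel offsets between 1 and 3 in both directions; the names `quarter_pel`, `filter8`, `round_downshift` and `clip`, and the final clipping inside the same function, are choices made for this illustration only.

```python
# Coefficient sets as listed in Equations (15) to (17), indexed by 1/4-pel offset.
COEFFS = {
    1: (-1, 3, -10, 59, 18, -6, 1, 0),
    2: (-1, 4, -10, 39, 39, -10, 4, -1),
    3: (0, 1, -6, 18, 59, -10, 3, -1),
}


def filter8(samples, position):
    """Unshifted 8-tap filter sum for the given 1/4-pel offset (1 to 3)."""
    return sum(c * s for c, s in zip(COEFFS[position], samples))


def round_downshift(value, shift=6):
    """Rounding 6-bit downshift (Steps 102 and 106)."""
    return (value + ((1 << shift) >> 1)) >> shift


def clip(value, lo=0, hi=255):
    return max(lo, min(hi, value))


def quarter_pel(block, h_pos, v_pos):
    """Interpolate one 1/4 sub-pixel from an 8x8 block of full-pel values."""
    # Steps 100 and 102: horizontal filtering of every row, then downshift.
    rows = [round_downshift(filter8(row, h_pos)) for row in block]
    # Steps 104 and 106: vertical filtering of the row results, then downshift.
    value = round_downshift(filter8(rows, v_pos))
    # The finally obtained value is clipped to the range 0 to 255.
    return clip(value)


flat = [[100] * 8 for _ in range(8)]
ramp = [[10 * x for x in range(8)] for _ in range(8)]  # horizontal gradient

print(quarter_pel(flat, 1, 3))
print(quarter_pel(ramp, 2, 2))
```

Every intermediate value in this sketch is either a filter sum or a downshifted result, so the values that must be stored between the two filtering steps remain small.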

When the 1/4 sub-pixels have been interpolated, the 16-bit operation interpolation filter 150 calculates 1/8 sub-pixels by performing bilinear filtering using the interpolated 1/4 sub-pixels, as in the conventional case. In other words, the 16-bit operation interpolation filter 150 raises the pixel accuracy of the reference picture from 1/4 sub-pixel accuracy to 1/8 sub-pixel accuracy (Step 108).

Through Step 100 to Step 108 performed by the 16-bit operation interpolation filter 150, a reference picture with interpolated 1/8 sub-pixel values is generated.

After that, the motion compensation/prediction unit 160 performs motion compensation using the reference picture with interpolated 1/8 sub-pixels and outputs the prediction signal PS indicating the result (Step 110).

To demonstrate that results similar to those of conventional interpolation implementations can be achieved when applying the present invention, the algorithm of the present invention has been implemented in the H.264/MPEG encoder processing software (JM4.2). The calculation results are illustrated in FIG. 9 and FIG. 10 by rate distortion curves indicating the impact on the perceived picture quality. The two figures differ only in the image sequences employed as examples.

The rate distortion curves of FIG. 9 and FIG. 10 show the bit rate on the X-axis and the peak signal to noise ratio (PSNR) on the Y-axis, the PSNR being a measure of the introduced distortions.
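For reference, the PSNR of 8-bit pictures is commonly computed from the mean squared error (MSE) as $10 \log_{10}(255^2 / \mathrm{MSE})$. The sketch below uses this common definition; it is not taken from the JM4.2 software, and the function name `psnr` and the flat-list picture representation are assumptions.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("pictures must be non-empty and of equal size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures: no distortion
    return 10.0 * math.log10(peak * peak / mse)


print(psnr([100, 100, 100, 100], [101, 101, 101, 101]))
```

A higher PSNR at the same bit rate indicates less distortion, so a curve lying higher in such a plot corresponds to better coding efficiency.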

FIG. 9 and FIG. 10 demonstrate that the 16-bit implementation of a 1/8 sub-pixel filter (1/8-pel 16 bit) does not result in an image quality degradation compared to the conventional JM4.2 algorithm (1/8-pel 25-bit) although the JM4.2 algorithm requires longer registers. In addition, the approach of the present invention actually performs better than 1/4 sub-pixel 20-bit coding (1/4-pel 20 bit).

FIG. 11 is a block diagram showing the configuration of a moving picture decoding apparatus in the embodiment of the present invention.

This moving picture decoding apparatus 300 includes: an entropy decoding unit 310; an inverse quantization/inverse transform unit 320; an adder 330; a deblocking filter 340; a memory 350 and a motion compensation unit 360.

The entropy decoding unit 310 obtains a bit stream BS outputted by the moving picture coding apparatus 100 and performs entropy decoding processing of the bit stream. As a result, the entropy decoding unit 310 outputs the quantized transform coefficients QC to the inverse quantization/inverse transform unit 320 and outputs the motion data MD indicating the motion vector to the motion compensation unit 360.

The inverse quantization/inverse transform unit 320 performs decoding processing, namely inverse quantization and inverse DCT, using the quantized transform coefficients QC. After that, the inverse quantization/inverse transform unit 320 outputs the differential decoding signal DDS that is the result of the processing to the adder 330.

The adder 330 adds the differential decoding signal DDS to the prediction signal PS obtained from the motion compensation unit 360, and outputs the resulting picture to the deblocking filter 340.

The deblocking filter 340 eliminates the block distortion of the picture outputted from the adder 330, and stores the deblocked picture in the memory 350. The decoded picture is extracted from the memory 350 as the output signal OS.

The motion compensation unit 360 includes: a 16-bit operation interpolation filter 361 for extracting the picture stored in the memory 350 as a reference picture and performing 1/8 sub-pixel interpolation of the reference picture; and a motion compensation/prediction unit 361 for predicting the current picture. This motion compensation/prediction unit 361 performs motion compensation based on the motion vector indicated by the motion data MD and the reference picture on which 1/8 sub-pixel interpolation has been performed by the 16-bit operation interpolation filter 361. In this way, the motion compensation/prediction unit 361 predicts the current picture and outputs the prediction signal PS indicating the prediction picture to the adder 330.

The moving picture decoding apparatus 300 is also characterized by including a 16-bit operation interpolation filter 361, as in the case of the moving picture coding apparatus 100. This 16-bit operation interpolation filter 361 has the same function as the 16-bit operation interpolation filter 150 of the moving picture coding apparatus 100. Therefore, with this moving picture decoding apparatus 300, it is possible to reduce operation workload and simplify a hardware configuration without using pixel values exceeding 16 bits in the process of calculating the pixel values.

In summary, the present invention provides improved motion estimation and compensation while requiring only a simplified hardware configuration and less computational effort. This is achieved by employing particular filter coefficients and additional downshift operations when obtaining a 1/4 sub-pixel resolution image. Accordingly, more efficient coding and decoding with a simpler hardware configuration can be achieved.

## (Variation)

Here, a variation of the method for interpolating 1/4 sub-pixel values in the embodiment will be described below.

In the above-described embodiment, a two-step interpolation is performed in the following way: 1/4 sub-pixel values are interpolated in the horizontal direction, and then other 1/4 sub-pixel values are interpolated in the vertical direction. In this variation, however, a single-step interpolation is performed instead of the two-step interpolation, the single-step interpolation achieving the same effect as that obtained through the interpolation in both the horizontal direction and the vertical direction. In other words, the 16-bit operation interpolation filter 150 of this variation functions as a two-dimensional filter.

FIG. 12 is an illustration of an interpolation method concerning the variation of the embodiment.

In FIG. 12, white circles show full-pel pixels that are present in a reference picture, and the pixel value of the pixel at the horizontal position h and the vertical position v is represented as $P_{h,v}$. Also, the number of taps of the two-dimensional filter is 36 (6 taps in both the horizontal direction and the vertical direction).

In this case, the 16-bit operation interpolation filter 150 calculates pixel values $p_{hv,ij}$ (i, j = 0 to 3, excluding "i=0 and j=0") of sub-pixels to be interpolated using the following Equation (22). Here, $c_{ij}(m, n)$ are filter coefficients (m, n = -2 to 3) and generally vary depending on the position (i, j) of the pixel to be interpolated. After that, the sub-pixel values calculated in this way are downshifted.

$$p_{hv,ij} = \sum_{m=-2}^{3} \sum_{n=-2}^{3} c_{ij}(m,n) P_{h+m,v-n} \dots (22)$$

In this variation, as in the case of the conventional example, the calculated sub-pixel values are always rounded and are not used for calculating the pixel values of other sub-pixels. Thus, it is possible to reduce the number of bits necessary for the calculation process of sub-pixels.
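The single-step interpolation of Equation (22) may be sketched as follows. The description does not list the coefficients $c_{ij}(m, n)$ or the downshift amount, so the kernels used below (an identity kernel and a two-point averaging kernel, both summing to 64) and the 6-bit rounding downshift are hypothetical stand-ins. The picture is assumed to be indexed as `pic[vertical][horizontal]`.

```python
def interpolate_2d(pic, h, v, coeffs, shift=6):
    """Evaluate Equation (22) at full-pel position (h, v), then downshift.

    `coeffs` maps (m, n), with m and n from -2 to 3, to c_ij(m, n).
    The rounding downshift by `shift` bits is an assumption of this sketch.
    """
    acc = 0
    for m in range(-2, 4):
        for n in range(-2, 4):
            acc += coeffs[(m, n)] * pic[v - n][h + m]
    return (acc + ((1 << shift) >> 1)) >> shift


def make_kernel(nonzero):
    """Build a 36-tap kernel that is zero except for the given entries."""
    kernel = {(m, n): 0 for m in range(-2, 4) for n in range(-2, 4)}
    kernel.update(nonzero)
    return kernel


# Hypothetical test picture: value = 10 * column + row.
pic = [[10 * x + y for x in range(8)] for y in range(8)]

identity = make_kernel({(0, 0): 64})               # reproduces P_{h,v}
average = make_kernel({(0, 0): 32, (1, 0): 32})    # mean of P_{h,v} and P_{h+1,v}

print(interpolate_2d(pic, 3, 3, identity))
print(interpolate_2d(pic, 3, 3, average))
```

Because each sub-pixel is computed directly from full-pel values, no rounded intermediate sub-pixel value feeds into another sub-pixel calculation.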

Although only an exemplary embodiment of this invention has been described in detail above, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiment without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention.

## **Industrial Applicability**

The motion compensation method according to the present invention provides the following two effects: operation workload can be reduced, and a hardware configuration can be simplified. For example, the motion compensation method can be applied to a moving picture coding apparatus for coding a moving picture, a moving picture decoding apparatus for decoding the coded moving picture, and the like.